An Experiment in the Use of Tools for Computer-Assisted Abstracting

Timothy C. Craven
Graduate School of Library and Information Science,
The University of Western Ontario,
London, Ontario
Canada.

Abstract

Experimental subjects wrote abstracts of an article using a simplified version of the TEXNET abstracting assistance software. In addition to the full text, the 35 subjects were presented with either keywords or phrases extracted automatically. The resulting abstracts, and the times taken, were recorded automatically; some additional information was gathered by oral questionnaire. Results showed considerable variation among subjects, but 37% found the keywords or phrases "quite" or "very" useful in writing their abstracts. Statistical analysis failed to support several hypothesized relations: phrases were not viewed as significantly more helpful than keywords; and abstracting experience did not correlate with originality of wording, approximation of the author abstract, or greater conciseness. Results also suggested possible modifications to the software.

Introduction

Suggestions for purely automatic abstracting methods, as surveyed by Paice (1990) and Endres-Niggemeyer (1994), do not show immediate promise of totally superseding human effort. An appropriate short-term goal would seem to be a hybrid system, in which some tasks are performed by human abstractors and other tasks by software. The model of such hybrid abstracting with which this paper is concerned involves providing writers of conventional abstracts with various computerized tools to assist them.

The general aim of the research reported in part in this paper is the development of a prototype computerized abstractor's assistant. As a kind of writer's assistant, such a software package should encompass a simple word processor and other general writer's tools (Kozma, 1991). In addition, the package should integrate tools, such as an automatic extractor, related specifically to the task of abstracting.

Abstracting assistance features are being prototyped in a text network management system, known as TEXNET (Craven, 1988; Craven, 1991b). Among other options in TEXNET, the abstractor can choose to be presented with full-text words that he or she is likely to want to extract verbatim. These are determined automatically on the basis of frequency, with stop-words being omitted. The abstractor can highlight words desired from this list, and they will automatically be inserted into the abstract in the order in which they were selected.
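The frequency-based selection just described can be illustrated with a short sketch. This is not TEXNET's actual code (TEXNET is written in Borland Pascal); the tokenization, the stop-word list, and the function name are assumptions made for illustration. The default threshold of 8 occurrences matches the one used in the experiment reported below.

```python
import re
from collections import Counter

# Illustrative stop-word list only; the list actually used by
# TEXNET is not given in the paper.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is",
              "that", "it", "for", "on", "as", "with", "be"}

def extract_keywords(full_text, min_frequency=8):
    """Return full-text words an abstractor is likely to want to
    extract verbatim: non-stop-words ranked by frequency, keeping
    only those occurring at least min_frequency times."""
    words = re.findall(r"[a-z']+", full_text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [w for w, n in counts.most_common() if n >= min_frequency]
```

An abstractor-facing display would then list these words for highlighting and insertion into the draft abstract.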

At the initial stage of development, attention was concentrated on keywords, rather than longer phrases, for two reasons. First, an earlier study (Craven, 1991a) had shown little use in abstracts of longer verbatim word sequences from full texts. Second, keyword extraction is a somewhat simpler task than phrase extraction, though methods for efficient phrase extraction do exist, as in INDEX (Jones et al., 1990), FASIT (Burgin & Dillon, 1992), CLARIT (Paijmans, 1993), and work reported by Fagan (1989).

Recently (Craven, in press) a phrase display option has also been added to TEXNET. Automatically selected phrases are displayed in a compact format that takes account of overlaps among them.

Purpose

The experiment described in this paper had several purposes: to evaluate certain general features of the TEXNET software for usability; to compare the phrase and keyword displays as aids to abstractors; to test various preliminary hypotheses about the relations between abstractors' backgrounds and their reactions to abstracting assistance software; and to obtain ideas for further features to be developed in TEXNET.

Hypotheses

The following hypotheses were among those to be tested in the experiment: (1) subjects presented with phrases would find the suggestions more useful than subjects presented with keywords, and would incorporate more of them into their abstracts; (2) subjects with more computerized editing experience would make greater use of wording from the full text; (3) subjects with more Microsoft Windows experience would find the software easier to use; and (4) subjects with more abstracting experience would rate their abstracts more highly, approximate the author abstract more closely, write more concisely, and use a more diverse vocabulary.

Methodology

Thirty-five paid subjects were drawn by advertisement within a university community. Each subject used a simplified version of TEXNET to write an abstract of an article.

Simplification of the package was undertaken in order to restrict subjects' choices for purposes of experimental control. The chief modification involved restricting the various kinds of displays presented to the full text, the abstract being written, and one other kind of display determined by the experimenter. Half of the subjects saw a display of keywords that occurred at least 8 times in the text; the other half saw a phrase display based on the same frequency threshold as the keyword display. Subjects were told only that these displays were computer-generated "suggestions".

Another simplification was to freeze the dimensions of the windows in which the three displays appeared; this was done mainly to avoid confusing novice subjects. Eliminated features also included spell-checking and various text-structuring capabilities. In addition, the package was modified so that the session would be terminated when the time limit was reached.

The resulting abstracts, and the times taken, were recorded automatically. Some additional information was gathered from the subjects by oral questionnaire: a short list of closed questions on subjects' backgrounds and reactions was followed by a single open-ended request for comments on the software, abstracting, and the experiment.

A letter of information provided to subjects defined an abstract as "a brief, objective, and accurate representation of the contents of a document"; this definition was derived from the ANSI standard for writing abstracts (ANSI, 1979). The letter also informed subjects that they were limited to no more than one hour to write the abstract and that the abstract should be no more than 250 words in length.

Before writing the abstract, each subject was given a brief demonstration of the software, entailing an oral review of an instruction sheet. The same instruction sheet was also available to the subject throughout the writing of the abstract.

A number of criteria were applied in choosing full texts for the experiments: (1) they must be readily available, with author abstracts, in ASCII code; (2) automatically reformatting them to fit into 40-character lines should not cause readability problems; (3) they should be scholarly but not excessively technical; (4) length should be approximately 2000 words.

The first selected text, from the area of education, showed the conventional parts of a scientific report: purpose, methodology, results, and conclusions. The second text, dealing with computer-mediated communication, fitted the pattern of a survey article.

For the experiments, each selected text was stripped of its title, authorship, abstract, references, and other peripheral elements, which were stored separately for future analysis.

The first text was used with 20 subjects, and the second text with 15.

SPSS for Windows (SPSS Inc., 1993) was used for statistical analysis. Questionnaire responses that fitted precoded categories were treated as ordinal values; novel responses were treated as missing values. Because of the number of ordinal variables and the irregular distributions of the interval variables, the Spearman correlation coefficient was generally employed to test relationships.
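For reference, the Spearman coefficient is simply the Pearson product-moment correlation computed on the ranks of the two variables, which is what makes it suitable for ordinal data and irregular distributions. The actual analysis was done in SPSS; the following stdlib-only sketch, with illustrative function names, shows the computation, including average ranks for ties.

```python
def ranks(values):
    """Average ranks (1-based) of values, handling ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the tied positions
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Because only ranks matter, any monotonic relation between the two variables yields a coefficient of 1 or -1 regardless of linearity.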

Results

Statistical analysis

A summary of the variables analyzed is given in Table 1. Correlation test results are presented in Table 2. Table 3 shows the range of responses when subjects were asked how useful they had found the "suggestions".

Overall, phrases did not perform significantly better than keywords in terms of perceived usefulness. When asked how good they thought their abstracts were, subjects again showed little difference between phrase and keyword displays.

The abstracts were analyzed to determine whether subjects presented with phrases tended to employ these phrases in their abstracts more than did the other subjects. The measure used was the number of occurrences of two consecutive words in the abstract that matched two consecutive words in the phrase display. Results suggested some of the expected effect: 7 of the phrase group's abstracts, but none of the keyword group's, showed more than 8 matches. Nevertheless, the difference was not statistically significant, and most subjects' abstracts in the two groups showed an undifferentiated scattering. The two author abstracts themselves showed 3 and 5 matches respectively.

Preference for phrasing from the full text was measured as the proportion, out of all pairs of consecutive words in the abstract, that were found also in the full text. Values ranged from 27.4% to 94.1%. The full-text-word-pair densities for the document authors' abstracts, at 42.1% and 42.4% respectively, were slightly toward the low end. A correlation was found here with previous Microsoft Windows experience, though not a statistically significant one. Correlation with previous computerized editing experience was weak. Density of full text phrasing showed little relation to abstracting experience.
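The word-pair ("dyad") density measure used above can be sketched as follows. The paper does not specify tokenization details, so lower-casing and the word pattern here are assumptions; the function names are illustrative.

```python
import re

def word_pairs(text):
    """All pairs of consecutive words in a text, lower-cased."""
    words = re.findall(r"[a-z']+", text.lower())
    return list(zip(words, words[1:]))

def full_text_dyad_density(abstract, full_text):
    """Proportion of the abstract's consecutive-word pairs that also
    occur somewhere in the full text (the FULLDYAD measure of
    Table 1)."""
    pairs = word_pairs(abstract)
    if not pairs:
        return 0.0
    source = set(word_pairs(full_text))
    return sum(p in source for p in pairs) / len(pairs)
```

The same pair-matching logic, applied against the phrase display rather than the full text, yields the match counts reported above, and against the author abstract yields the AUABDYAD measure.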

A measure of the percent of abstract words found in the full text ranged from 45% to 99%, with the document authors' own abstracts at 78% and 77% respectively. For the first document, this measure showed a significant relation to Microsoft Windows experience, and a marginally significant relation to computerized editing experience in general. The overall correlation for both documents was positive, but not statistically significant.

Similarity to phrasing in the author's own abstract was also measured by word pair density. Values ranged from 1.3% to 18.5%. Little relation was found with previous abstracting experience. Experience with Microsoft Windows showed significant correlation with echoing author-abstract phrasing for the first document, but not in general.

Percent of subject-abstract words found in the document author's abstract ranged from 12% to 43%. This again did not correlate with abstracting experience. It also did not correlate with Microsoft Windows experience.

Responses to the question "How easy was it for you to use the software?" correlated, but not significantly, with previous Microsoft Windows experience and did not correlate noticeably with previous computerized editing.

Responses to the question "How good do you think your abstract is?" showed a limited range: only one subject answered "not at all good", and none "very good". Coded responses correlated with experience with abstracting, but not significantly.

The negative correlation of experience with abstracting with length of abstract, measured in characters, was very weak. Nor did abstracting experience correlate with vocabulary diversity, as measured inversely by Simpson's λ (Simpson, 1949).
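Simpson's index is the probability that two tokens drawn at random without replacement are the same word type; a lower value thus indicates a more diverse vocabulary, which is why it serves as an inverse measure here. A sketch, with tokenization again an assumption:

```python
import re
from collections import Counter

def simpsons_lambda(text):
    """Simpson's index of diversity (Simpson, 1949):
    sum over word types of n_i * (n_i - 1) / (N * (N - 1)),
    where n_i is the count of type i and N the total token count.
    Lower values indicate a more diverse vocabulary."""
    words = re.findall(r"[a-z']+", text.lower())
    n = len(words)
    if n < 2:
        return 0.0
    counts = Counter(words)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
```

A text with no repeated words scores 0; a text consisting of one word repeated scores 1.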

Of 35 subject abstracts, 13 showed no spelling errors. As was to be expected, spelling error density correlated negatively with experience with computerized editing. It also correlated negatively with use of source-text phrases. There was a nonsignificant negative correlation with seconds remaining at completion.

Free-form comments

Space does not permit inclusion of a detailed analysis of subject comments here, but some highlights will be given.

Five subjects would have liked having the original on paper. Seven subjects commented favorably on the simultaneous display, but thirteen criticized the amount or shape of space given to the three windows: several mentioned wanting more space for either the full text or the abstract.

Ten subjects commented on problems with scrolling.

Four subjects criticized the fact that selected text remained highlighted even after being copied to the clipboard.

Free-form comments by subjects relating to the phrase and keyword displays generally concurred with responses to the closed question on this topic.

Ten subjects noted problems with estimating length of texts. Six of these thought that a word count feature would be helpful. Four subjects commented on the lack of a spell checker.

Five of the 15 subjects who abstracted the survey article commented on difficulties in understanding it.

Four subjects made comments suggesting that they found the time of one hour too short.

Discussion

Statistical analysis

One of the major hypotheses was supported at a statistically significant level for the first document: subjects with computer editing experience were indeed more likely to copy from the full text. A major factor in the correlation between Windows experience and use of author-abstract phrases was no doubt the author's fairly substantial use of full-text wording in abstracting the article. Yet it is not clear why the same effect was not observed with the addition of the second document, where the author abstract had a similar use of wording from the full text.

Other hypotheses were favored by the data, but correlations were weaker and statistical significance was not attained: subjects presented with phrases tended to incorporate them in their abstracts; subjects with more Windows experience tended to find the software easier; and more experienced abstractors tended to think that their abstracts were better.

Analysis showed several hypothesized relations to be almost nonexistent: that between perceived usefulness of the suggestions and presentation of phrases rather than keywords; and those between abstracting experience on the one hand and, on the other, approximation to the author's own abstract, shorter abstracts, and more diverse vocabulary.

Free-form comments

A large, higher-resolution screen seems clearly advisable for computer-assisted abstracting, at least if the display is to include the full text. Especially where such a screen is not available, the capability of adjusting window sizes, either automatically or on demand, is important.

As soon as subjects' difficulties with scrolling became apparent, the full package was reprogrammed to provide smoother-scrolling displays. The version used for the experiment was left unchanged to provide uniformity.

The lack of immediate feedback from the copy command, including the retention of text selection, certainly caused problems, as also reported by Nielsen (1994).

Unlike the test software, the full package provides a single command to copy a selected text and paste it immediately into the abstract. This function might have been favorably received by some of the subjects. It does require, however, that the abstract insertion point be set correctly first.

An optional dynamic word-count display has since been introduced into the full package, for both abstract and full text. Some of subjects' difficulties with estimating word counts were perhaps due to unfamiliarity with scroll bars.

A spell checker was available in the full version. It was not incorporated in the test version for several reasons, including its incompleteness. The full version was subsequently modified to make use of a spell-checker, such as WinSpell (R&TH, 1994), that runs as a separate application but can monitor keystrokes.

It appears that the second document was somewhat more difficult than the first. Beyond intrinsic reading difficulty, survey articles are likely to be harder to abstract by the copy-and-paste method.

Most subjects in fact finished before the hour was up, and the quickest completed in less than 40 minutes. The time limit thus appears not to have been excessively restrictive, though also not excessively generous. These times may be compared with those of Van Dijk and Kintsch's subjects, who produced 60-to-80-word abstracts of a 1600-word story at a computer console (Van Dijk & Kintsch, 1985).

Further research

The current series of experiments is to be continued using additional source texts and aids already available in TEXNET. The effect of presenting abstractors with extracted sentences should certainly be investigated. Another sort of suggestion that is available in TEXNET is a list of general indicative formulas or of phrases commonly used in abstracting. The work of Tibbo (1992) indicates considerable variation in abstract content types between disciplines; thus, such a set of phrases should probably vary with the field, and a suitable set of phrases would need to be developed for the test collection.

More detailed analysis of the abstracts produced is possible. For example, Salager-Meyer (Salager-Meyer, 1991) studied basic content elements, order, and paragraphing in English-language medical abstracts. Kaplan and others (Kaplan et al., 1994) applied linguistic analysis to abstracts submitted for conference presentations and also divided them into accepted and non-accepted categories. Current plans call at least for having a sample of subjects' abstracts graded by an independent reviewer.

In studies involving think-aloud protocols (Endres-Niggemeyer, Waumans, & Yamashita, 1991), it has been noted that individuals use quite different approaches in writing abstracts. Differences in approach might correlate with properties of abstracts produced or with reactions to particular computerized tools.

Additional evaluative research should be undertaken by others: this should include research into the tools' efficiency and effectiveness in assisting in the real-life production of abstracts, as well as replication of the experimental situation with a variety of abstracts and abstractors. It is intended to make the software freely available to students and other interested individuals, for purposes of feedback and both formal and informal testing.

Availability

The TEXNET software referred to in this paper, as well as the simplified version used in the experiments, is written as a Microsoft Windows application in Borland Pascal with Objects 7.0. Either source or executable code is available by sending a 5 1/4" or a 3 1/2" dual-density diskette to the author; both may be obtained if two dual-density or one high-density diskette is sent.

Acknowledgement

Research reported in this paper was supported in part by individual operating grant A9228 of the Natural Sciences and Engineering Research Council of Canada.

References

American National Standards Institute (1979). American national standard for writing abstracts (ANSI Z39.14-1979).

Burgin, R., & Dillon, M. (1992). Improving disambiguation in FASIT. Journal of the American Society for Information Science, 43 (2), 101-114.

Craven, T.C. (1988). Text network display editing with special reference to the production of customized abstracts. Canadian journal of information science, 13 (1/2), 59-68.

Craven, T.C. (1991a). Use of words and phrases from full text in abstracts. Journal of information science, 16, 351-358.

Craven, T.C. (1991b). Algorithms for graphic display of sentence dependency structures. Information processing and management, 27 (6), 603-613.

Craven, T.C. (1993). A computer-aided abstracting tool kit. Canadian journal of information science, 18 (2), 19-31.

Craven, T.C. (In press). Presentation of repeated phrases in a computer-assisted abstracting tool kit. Information processing and management.

Endres-Niggemeyer, B. (1994). Summarizing text for intelligent communication: results of the Dagstuhl Seminar. Knowledge organization, 21 (4), 213-223.

Endres-Niggemeyer, B., Waumans, W., & Yamashita, H. (1991). Modelling summary writing by introspection: a small-scale demonstrative study. Text, 11 (4), 523-552.

Fagan, J.L. (1989). The effectiveness of a nonsyntactic approach to automatic phrase indexing for document retrieval. Journal of the American Society for Information Science, 40 (2), 115-132.

Jones, L.P., Gassie, E.W., & Radhakrishnan, S. (1990). INDEX: the statistical basis for an automatic conceptual phrase-indexing system. Journal of the American Society for Information Science, 41 (2), 87-97.

Kaplan, R.B., Cantor, S., Hagstrom, C., Kamhi-Stein, L.D., Shiotani, Y., & Zimmerman, C.B. (1994). On abstract writing. Text, 14 (3), 401-426.

Kozma, R.B. (1991). The impact of computer-based tools and embedded prompts on writing processes and products of novice and advanced college writers. Cognition and instruction, 8 (1), 1-27.

Nielsen, J. (1994). Estimating the number of subjects needed for a thinking aloud test. International journal of human-computer studies, 41, 385-397.

Paice, C. (1990). Constructing literature abstracts by computer: techniques and prospects. Information processing and management, 26 (1), 171-186.

Paijmans, H. (1993). Comparing the document representations of two IR systems: CLARIT and TOPIC. Journal of the American Society for Information Science, 44 (7), 383-392.

Salager-Meyer, F. (1991). Medical English abstracts: how well are they structured?. Journal of the American Society for Information Science, 42 (7), 528-531.

Simpson, E.H. (1949). Measurement of diversity. Nature, 163, 688.

R & TH Inc (1994). WinSpell version 3.08: the Windows spelling supervisor. Richardson, Texas: R & TH Inc.

SPSS Inc. (1993). SPSS for Windows, Release 6.0 (Jun 17 1993).

Tibbo, H.R. (1992). Abstracting across the disciplines: a content analysis of abstracts from the natural sciences, the social sciences, and the humanities with implications for abstracting standards and online information retrieval. Library and information science research, 14 (1), 31-56.

Van Dijk, T., & Kintsch, W. (1985). Cognitive psychology and discourse: recalling and summarizing stories. In H. Singer, & R.B. Ruddell (Eds.), Theoretical models and processes of reading, third edition (pp. 794-812). Newark, Delaware: International Reading Association.

Table 1: Variables

AUABDYAD Proportion of 2-word sequences in abstract found in author abstract
AUABWORD Proportion of words in abstract found in author abstract
BYTES Length of abstract in bytes
COMPEDIT "How familiar are you with using a computer for editing text?"
EXABOTHE "How much experience have you had in writing abstracts of documents written by other people?"
FULLDYAD Proportion of 2-word sequences in abstract found in full text
FULLWORD Proportion of words in abstract found in full text
GOODABST "How good do you think your abstract is?"
MISSPDEN Proportion of misspelled words
PHRADYAD Number of 2-word sequences in abstract found in phrase display
SEC.REMA Seconds remaining at completion
SIMPSONL Simpson's λ of abstract words
SOFTEASY "How easy was it for you to use the software?"
SUGGESTI Type of information in Suggestions window
SUGGUSEF "How useful did you find the information in the Suggestions window?"
WINDOWS "How familiar are you with the Microsoft Windows environment?"

Table 2: Correlations

Variables correlated Spearman correlation Significance
SUGGESTI - SUGGUSEF 0.1076 0.551
SUGGESTI - GOODABST 0.1238 0.507
SUGGESTI - PHRADYAD 0.3005 0.079
FULLDYAD - WINDOWS 0.2632 0.139
FULLDYAD - COMPEDIT 0.1243 0.491
FULLDYAD - EXABOTHE 0.1163 0.506
FULLWORD - COMPEDIT (document 1) 0.2903 (0.4868) 0.101 (0.040)
FULLWORD - WINDOWS 0.5987 0.007
AUABDYAD - EXABOTHE 0.0650 0.711
AUABDYAD - WINDOWS (document 1) 0.2292 (0.5725) 0.200 (0.010)
AUABWORD - EXABOTHE 0.0681 0.697
AUABWORD - WINDOWS -0.1394 0.439
SOFTEASY - WINDOWS 0.2764 0.126
SOFTEASY - COMPEDIT 0.0577 0.758
EXABOTHE - GOODABST 0.2251 0.223
EXABOTHE - BYTES -0.1070 0.540
EXABOTHE - SIMPSONL -0.0265 0.880
MISSPDEN - COMPEDIT -0.4149 0.016
MISSPDEN - FULLDYAD -0.4886 0.003
MISSPDEN - SEC.REMA -0.1283 0.463

Table 3: "How useful did you find the information in the Suggestions window?"

Phrases subjects Keywords subjects
"not at all useful" 1 5
"not very useful" 9 5
"quite useful" 4 5
"very useful" 2 2
other 2 0

© 1996, American Society for Information Science. Permission to copy and distribute this document is hereby granted provided that this copyright notice is retained on all copies and that copies are not altered.